Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Authors

  • Shane Griffith
  • Kaushik Subramanian
  • Jonathan Scholz
  • Charles Lee Isbell
  • Andrea Lockerd Thomaz
Abstract

A long-term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternative, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by using it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
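The abstract does not spell out how feedback becomes "direct policy labels", so the following is only a minimal sketch under stated assumptions. It assumes that each action in a state carries a net feedback count (positive labels minus negative labels), that a single consistency parameter in (0, 1) gives the chance a label is correct, and that the feedback-derived distribution is multiplied with the agent's own action distribution and renormalized. The function names `feedback_policy` and `shape_policy` and all parameter names are hypothetical and do not come from the paper.

```python
def feedback_policy(delta, consistency):
    """Estimate, per action, the probability that the action is optimal.

    delta: list of net feedback counts (positive minus negative) per action.
    consistency: assumed probability in (0, 1) that a human label is correct.
    Assumption: the estimate is C**d / (C**d + (1 - C)**d) for count d.
    No numerical safeguards are included for very large counts.
    """
    probs = []
    for d in delta:
        good = consistency ** d
        bad = (1.0 - consistency) ** d
        probs.append(good / (good + bad))
    return probs


def shape_policy(agent_probs, delta, consistency):
    """Combine the agent's action distribution with the feedback estimate.

    Assumption: the two sources are combined by elementwise multiplication
    followed by renormalization so the result sums to one.
    """
    feedback = feedback_policy(delta, consistency)
    weights = [p * f for p, f in zip(agent_probs, feedback)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two actions, uniform agent policy, one approval for action 0 and
    # one disapproval for action 1, with labels assumed 80% reliable.
    print(shape_policy([0.5, 0.5], [1, -1], 0.8))
```

Under these assumptions, an action with no feedback gets a neutral estimate of 0.5, so the agent's own distribution is left unchanged when no labels have been given. This is one way to read the claim of robustness to infrequent feedback.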

Similar resources

Policy Shaping from Simulated Critique in Domains with Multiple Optimal Policies

In many domains, there exist multiple ways for an agent to achieve optimal performance. Feedback may be provided along one or more of them to aid learning. In this work, we evaluate the interactive reinforcement learning algorithm Policy Shaping in domains with multiple optimal policies. We codify different feedback strategies as automated oracles and analyze their effect on the agent’s learnin...

Policy Shaping in Domains with Multiple Optimal Policies (Extended Abstract)

In many domains, there exist multiple ways for an agent to achieve optimal performance. Feedback may be provided along one or more of them to aid learning. In this work, we investigate whether humans have a preference towards providing feedback along one optimal policy over the other in two gridworld domains. We find that for the domain with significant risk to exploration, 60% of our participa...

Combining manual feedback with subsequent MDP reward signals for reinforcement learning

As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that...

Human large-scale oscillatory brain activity during an operant shaping procedure.

The present study aimed at examining the oscillatory brain-electric correlates of human operant learning using high-density electroencephalography (EEG). Induced gamma-band activity (GBA) was studied using a fixed-interval reinforcement schedule with a variable limited hold period, which was decreased depending on response accuracy. Thus, participants' behavior was shaped during the course of t...

Masters Thesis: Shaping Methods to Accelerate Reinforcement Learning

Reinforcement learning (RL) is an attractive solution for deriving an optimal control policy by on-line exploration of the control task. In reinforcement learning there is no need to specify how the task is to be achieved. In fact, RL is a way of programming the agents by specifying a reward function. At every time step, the controller (agent) receives the process (environment) state, takes an ...

Publication date: 2013